Zhuowen Tu

Soft Tail-dropping for Adaptive Visual Tokenization

Jan 20, 2026

Talk2Move: Reinforcement Learning for Text-Instructed Object-Level Geometric Transformation in Scenes

Jan 08, 2026

CVP: Central-Peripheral Vision-Inspired Multimodal Model for Spatial Reasoning

Dec 09, 2025

Real Deep Research for AI, Robotics and Beyond

Oct 23, 2025

C3Editor: Achieving Controllable Consistency in 2D Model for 3D Editing

Oct 06, 2025

VideoNSA: Native Sparse Attention Scales Video Understanding

Oct 02, 2025

YOLO-Count: Differentiable Object Counting for Text-to-Image Generation

Aug 01, 2025

DepR: Depth Guided Single-view Scene Reconstruction with Instance-level Diffusion

Jul 30, 2025

AuthGuard: Generalizable Deepfake Detection via Language Guidance

Jun 04, 2025

Ground-V: Teaching VLMs to Ground Complex Instructions in Pixels

May 20, 2025